Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #783 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 16 18 +2
Lines 1794 1968 +174
==========================================
+ Hits 1794 1968 +174 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@tomvothecoder @lee1043 @pochedls @chengzhuzhang As it was pointed out last meeting using |
ed5c9f2 to
993ae15
Compare
|
Notes from today's meeting (07/30/25)
|
|
@tomvothecoder @lee1043 @pochedls @chengzhuzhang Option 1: ds.geomask.regionmask("pr", keep="sea", ...)
ds.geomask.pcmdi("pr", keep="land", threshold1=0.5, threshold2=0.8, source=improved_navy_land)Option 2: ds.geomask.mask_land("pr", method="pcmdi", options={"threshold1": 0.5, "threshold2": 0.8, source: improved_navy_land}) |
|
I'd prefer a generic function where you specify the method (i.e., Option 2). I think there must be a way of avoiding passing in the arguments as a def mask_land(vid, method='pcmdi', threshold1=0.5, threshold2=0.8, source=improved_navy_land):
"""
...
Parameters:
------------
vid ...
method ...
threshold1 (float) : [...description...]. Parameter used with 'pcmdi' method (otherwise ignored).
Default value is 0.5.
...
"""
if method == 'pcmdi':
return _pcmdi_land_sea_mask(..., threshold1=threshold1, threshold2=threshold2, ...)
elif method == 'regionmask'
...
else:
raise ValueError('Invalid method supplied...')If the user supplies |
|
@pochedls I could do that but I'm worried it could be a little confusing/overwhelming. Confusing in the sense of no clear indication which method the arguments work for besides being in the docstring. Overwhelming if we end up adding additional methods or arguments for the methods. I could do it similar to how it was done in the regridder methods, where it's a def mask_sea(
self,
data_var: str,
method: str = "regionmask",
criteria: float | None = None,
mask: xr.DataArray | None = None,
output_mask: bool = False,
**options: Any,
): |
I agree with you @jasonb5. It can be confusing and hard for users to validate which parameters apply to which
I learned that this is a "hybrid dispatcher pattern" that is a very common and accepted approach in scientific Python libraries (e.g., Xarray, MatPlotLib, etc.). There is one public API with multiple backends that have different implements. It keeps the main entry point simple by defining the shared parameters upfront with optional arguments passed via I think this is a reasonable approach, as long as we document it well, use strict option checking, and structure the internal dispatch clearly. We already use this pattern in the regridder too. |
|
I am good with whatever way you choose, but I have to admit that it is inconvenient that the regridder arguments are embedded across several docstrings (that users may have trouble locating). In this issue/case, the docstring wouldn't actually be that long (see xr.open_mfdataset for an example of a pretty long docstring). I don't find it confusing that some arguments would apply under some conditions but not others (if it is noted in the docstring). With that said, I'm okay with docstrings being broken up, but I do have a preference for a main mask call (e.g., mask_land) versus two calls based on method (pcmdi or regionmask). |
|
@tomvothecoder Do we have a documentation section for methods not associated with an accessor? |
If you're referring to stand-alone functions, there is a Top-level API Functions section. We might want to split this section to sub-sections at some point, as it is growing pretty long. |
|
@tomvothecoder The function I want to add is not top-level. It's a function in |
I see. If you want to list that function in the docs, we can create a separate section for module-level functions. |
|
@tomvothecoder @lee1043 This can be tested now. I updated the documentation to include the module-level methods. |
There was a problem hiding this comment.
Pull Request Overview
This PR adds support for generating land-sea masks using the regionmask and pcmdi methods. It introduces a new .geomask accessor for xarray Datasets that provides land and sea masking functionality.
- Adds land-sea mask generation with two methods: regionmask (default) and PCMDI
- Introduces a new MaskAccessor class with
.mask_land()and.mask_sea()methods - Refactors grid accessor code to extract reusable functions for grid dataset creation
Reviewed Changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| xcdat/mask.py | New module implementing land-sea mask generation functionality |
| xcdat/regridder/grid.py | Adds add_bounds parameter and dtype fix for Gaussian axis bounds |
| xcdat/regridder/accessor.py | Refactors grid property to use extracted helper functions |
| xcdat/init.py | Imports MaskAccessor to register the .geomask accessor |
| tests/test_mask.py | Comprehensive test suite for mask functionality |
| pyproject.toml | Adds regionmask dependency and package data configuration |
| docs/api.rst | Documents new mask API functions and accessor methods |
| conda-env/dev.yml | Adds regionmask dependency for development environment |
| conda-env/ci.yml | Adds regionmask dependency for CI environment |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
xcdat/mask.py
Outdated
|
|
||
| Generate a land-sea mask using the PCMDI method with a custom high resolution source: | ||
|
|
||
| >>> highres_ds = xcdata.open_dataset("/path/to/file") |
There was a problem hiding this comment.
"xcdata" should be "xcdat" in the example code.
| >>> highres_ds = xcdata.open_dataset("/path/to/file") | |
| >>> highres_ds = xcdat.open_dataset("/path/to/file") |
|
@jasonb5 Can you provide minimum reproducible examples with the two methods, ideally with some test data? This will help expedite the code review process. |
Actually, I can probably just create a script using the xcdat-data repo. Feel free to provide one if you have it though. |
a744528 to
a5407e3
Compare
There was a problem hiding this comment.
@jasonb5 My commit dcd1980 (#783) moves the navy_land.nc file to xcdat/resources so that it doesn't float at the top-level of the directory. It also simplifies the _get_resource_path() function and moves it to the utils.py file (with updated unit tests in test_utils.py)
I left review comments on the code diff for further clarification. Let me know if you agree with the changes. I am still reviewing changes to other files.
Changes
- Move
navy_land.nctoresources/and_get_resource_path()toutils.py - Refactor
_get_resource_path()and expand unit tests - Update
pcmdi_land_sea_mask()docstring and disable time decoding fornavy_land.nc - Adjust
.gitignoreandpyproject.tomlto include resources
EDIT: Explore hosting navy_land.nc file externally and downloading based on user need (my comment here).
|
@jasonb5 @lee1043 The |
@tomvothecoder Agreed, hosting the data somewhere and download only when it is needed sounds a good solution to me. |
I pushed |
- Updated pcmdi_land_sea_mask() docstring and disabled time decoding for navy_land.nc - Replaced bundled navy_land.nc with pooch fetch-and-cache system - Removed mask_get_resource_path() and its tests - Added _data._get_pcmdi_mask_path() with corresponding tests - Updated environment configs to require pooch >=1.8 - Refreshed related docstrings
f1a9d5b to
3413abc
Compare
|
@jasonb5 thank you so much for the work! And thanks to @tomvothecoder and @pochedls as well. Kudos for the team!! |
|
@tomvothecoder @pochedls @lee1043 Thanks all, glad we got this done! Cheers! |
Description
Adds support for generating land-sea mask using the
regionmaskandpcmdimethods.Checklist
If applicable: